NeurIPS 2020
Author Rebuttal for NeurIPS 2020 Submission #3770
We thank all the reviewers for their valuable comments and suggestions. We respond to the main concerns below. All source code will be released to the community soon.

Q: The authors should report the actual inference time/latency of their models.
A: We evaluate the proposed method on a Tesla V100 GPU.
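The response does not show the measurement procedure itself; for reference, here is a minimal sketch of how per-batch GPU inference latency is commonly measured in PyTorch. The model, input shape, and iteration counts are placeholders, not the submission's actual network.

```python
import torch

# Placeholder model and input; the submission's actual architecture is not shown here.
model = torch.nn.Sequential(torch.nn.Conv2d(3, 64, 3, padding=1), torch.nn.ReLU()).cuda().eval()
x = torch.randn(1, 3, 224, 224, device="cuda")

# Warm up so kernel compilation and caching do not skew the timing.
with torch.no_grad():
    for _ in range(10):
        model(x)

# CUDA events time work on the GPU itself, avoiding host-side skew.
start = torch.cuda.Event(enable_timing=True)
end = torch.cuda.Event(enable_timing=True)
with torch.no_grad():
    start.record()
    for _ in range(100):
        model(x)
    end.record()
torch.cuda.synchronize()
print(f"mean latency: {start.elapsed_time(end) / 100:.2f} ms")
```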
Supplementary Material: Learning to Play Sequential Games versus Unknown Opponents. Pier Giuseppe Sessa, Ilija Bogunovic, Maryam Kamgarpour, Andreas Krause (NeurIPS 2020)
Our goal is to bound the learner's cumulative regret, where [...] are the actions chosen by the learner. In case we have k(·, ·) ≤ L for some L > 0, then the result holds with constant L.

When the learner selects actions according to the standard MW update algorithm, we follow the same proof steps as in the proof of Theorem 1 to show that, with probability at least 1 − δ, the learner's regret can be bounded as R(T) ≤ [...]. The corollary's statement then follows by observing that [...].

As discussed in Section 3.3, in a repeated Stackelberg game the decision [...]. Before bounding the leader's regret, recall that the algorithm resulting from Corollary 3 consists of [...].

In this section, we describe the experimental setup of Section 4.1.

Figure 3: Obtained rewards when the rangers know the poachers' model (OPT) and when they use the proposed algorithm to update their patrol strategy online (SU). The poachers choose actions to maximize their own utility function SU(x, y); for the poachers' utility we use [...]. GP-UCB either converges to suboptimal solutions or displays a slower learning curve. In the case of more than one best response, ties are broken in an arbitrary but consistent manner.
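Since the proof fragments above lean on the standard multiplicative-weights (MW) update, here is a minimal self-contained sketch of that update for a finite action set. The reward vector and learning rate eta are illustrative stand-ins, not the paper's quantities.

```python
import numpy as np

def mw_update(weights, rewards, eta):
    """One multiplicative-weights step: upweight each action in proportion
    to exp(eta * reward), then renormalise to a probability distribution."""
    weights = weights * np.exp(eta * rewards)
    return weights / weights.sum()

# Toy usage: 4 actions, uniform start, illustrative rewards in [0, 1].
rng = np.random.default_rng(0)
p = np.full(4, 0.25)
for t in range(100):
    rewards = rng.uniform(size=4)  # placeholder for the rewards observed at round t
    p = mw_update(p, rewards, eta=0.1)
print(p)
```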
Author Response - NeurIPS 2020: Submission 11074 - "Almost Surely Stable Dynamics"
We thank the reviewers for their time and effort in providing helpful feedback. We address the comments in order. Reviewer #1: no report submitted. This constrains the state transition densities that their models can express. Our approach builds on Lyapunov networks and is readily applicable in both deterministic and stochastic settings. We will add variance information to our figures.
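For context on the Lyapunov-network claim, the sketch below empirically checks the discrete-time Lyapunov decrease condition V(f(x)) − V(x) ≤ −αV(x) on sampled states. The dynamics f, candidate V, and margin alpha here are illustrative assumptions, not the submission's learned model, which enforces such a condition by construction.

```python
import numpy as np

def f(x):
    # Illustrative stable linear dynamics x_{t+1} = A x_t (spectral radius < 1).
    A = np.array([[0.9, 0.1], [0.0, 0.8]])
    return A @ x

def V(x):
    # Illustrative quadratic Lyapunov candidate V(x) = ||x||^2.
    return float(x @ x)

# Check the decrease condition V(f(x)) - V(x) <= -alpha * V(x) on random states.
alpha = 0.05
rng = np.random.default_rng(0)
ok = all(V(f(x)) - V(x) <= -alpha * V(x)
         for x in rng.standard_normal((1000, 2)))
print("decrease condition holds on samples:", ok)
```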
Scalable Rail Planning and Replanning with Soft Deadlines
Chen, Zhe, Li, Jiaoyang, Harabor, Daniel, Stuckey, Peter J.
The Flatland Challenge, first held in 2019 and reported at NeurIPS 2020, is designed to answer the question: how to efficiently manage dense traffic on complex rail networks? Given the significance of punctuality in real-world railway operation and the fact that fast passenger trains share the network with slow freight trains, Flatland version 3 introduces trains with different speeds and scheduling time windows. This paper introduces the Flatland 3 problem definitions and extends an award-winning MAPF-based software system, which won the NeurIPS 2020 competition, to efficiently solve Flatland 3 problems. The resulting system won the Flatland 3 competition. We designed a new priority ordering for initial planning and a new neighbourhood selection strategy for efficient solution-quality improvement with Multi-Agent Path Finding via Large Neighborhood Search (MAPF-LNS), and we use MAPF-LNS to partially replan the trains affected by malfunctions.
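To make the MAPF-LNS loop concrete, here is a schematic sketch of large-neighbourhood search over agents' paths. The neighbourhood selector, replanning subroutine, and cost function are toy stand-ins, not the competition system's actual components.

```python
import random

def mapf_lns(paths, cost, select_neighbourhood, replan, iterations=100):
    """Schematic MAPF-LNS: repeatedly 'destroy' a few agents' paths and
    replan them, keeping the new plan only when total cost improves."""
    best = dict(paths)
    for _ in range(iterations):
        agents = select_neighbourhood(best)   # e.g. agents in a congested region
        candidate = replan(best, agents)      # re-route only the selected agents
        if cost(candidate) < cost(best):
            best = candidate
    return best

# Toy stand-ins: each "path" is reduced to its length; replanning perturbs it.
rng = random.Random(0)
paths = {a: rng.randint(10, 30) for a in range(8)}
cost = lambda p: sum(p.values())
select = lambda p: rng.sample(sorted(p), 3)
def replan(p, agents):
    q = dict(p)
    for a in agents:
        q[a] = max(1, q[a] + rng.randint(-3, 3))  # stand-in for single-agent search
    return q

print(cost(paths), cost(mapf_lns(paths, cost, select, replan)))
```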
- North America > United States > California (0.14)
- Oceania > Australia (0.04)
- Research Report (0.40)
- Contests & Prizes (0.34)
PowRL: A Reinforcement Learning Framework for Robust Management of Power Networks
Chauhan, Anandsingh, Baranwal, Mayank, Basumatary, Ansuma
Power grids across the world play an important societal and economic role by providing uninterrupted, reliable and transient-free power to industries, businesses and household consumers. With the advent of renewable power resources and EVs resulting in uncertain generation and highly dynamic load demands, it has become ever more important to ensure robust operation of power networks through suitable management of transient stability issues and localization of blackout events. In light of the ever-increasing stress on modern grid infrastructure and grid operators, this paper presents a reinforcement learning (RL) framework, PowRL, to mitigate the effects of unexpected network events and reliably maintain electricity everywhere on the network at all times. PowRL leverages a novel heuristic for overload management, along with RL-guided decision making on optimal topology selection, to ensure that the grid is operated safely and reliably (with no overloads). PowRL is benchmarked on a variety of competition datasets hosted by L2RPN (Learning to Run a Power Network). Even with its reduced action space, PowRL tops the leaderboard in the L2RPN NeurIPS 2020 challenge (Robustness track) at an aggregate level, while also being the top-performing agent in the L2RPN WCCI 2020 challenge. Moreover, detailed analysis shows state-of-the-art performance by the PowRL agent in some of the test scenarios.
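A schematic reading of the control loop the abstract describes, with placeholder components; this is our illustration of the heuristic-plus-RL split, not the authors' code.

```python
def act(obs, rl_policy, overload_heuristic):
    """Schematic reading of the PowRL abstract (not the authors' code):
    fall back to a cheap heuristic action when a line is overloaded,
    otherwise let the RL policy choose a topology action."""
    if max(obs["line_loads"]) > 1.0:       # a line exceeds its thermal limit
        return overload_heuristic(obs)     # heuristic overload management
    return rl_policy(obs)                  # RL-guided topology selection

# Toy usage with placeholder components.
obs = {"line_loads": [0.7, 1.2, 0.9]}
print(act(obs,
          rl_policy=lambda o: "switch-substation-topology",
          overload_heuristic=lambda o: "reconnect-or-redispatch"))
```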
- North America > United States (0.14)
- Europe > France (0.04)
- Asia > India > Maharashtra > Mumbai (0.04)
- Asia > India > Karnataka > Bengaluru (0.04)
- Energy > Renewable (1.00)
- Energy > Power Industry > Utilities (0.46)
GitHub - facebookresearch/nle: The NetHack Learning Environment
The NetHack Learning Environment (NLE) is a Reinforcement Learning environment presented at NeurIPS 2020. NLE is based on NetHack 3.6.6 and designed to provide a standard RL interface to the game, and comes with tasks that function as a first step to evaluate agents on this new environment. NetHack is one of the oldest and arguably most impactful videogames in history, as well as being one of the hardest roguelikes currently being played by humans. It is procedurally generated, rich in entities and dynamics, and overall an extremely challenging environment for current state-of-the-art RL agents, while being much cheaper to run compared to other challenging testbeds. Through NLE, we wish to establish NetHack as one of the next challenges for research in decision making and machine learning.
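NLE exposes the standard (pre-Gymnasium) gym interface; roughly following the repository's quick-start at release time:

```python
import gym
import nle  # registers the NetHack environments with gym

env = gym.make("NetHackScore-v0")
obs = env.reset()  # each reset starts a new procedurally generated dungeon
done = False
while not done:
    action = env.action_space.sample()  # random agent as a placeholder policy
    obs, reward, done, info = env.step(action)
env.render()
```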
- Education (0.83)
- Leisure & Entertainment > Games (0.37)
GitHub - mit-han-lab/tinyengine: [NeurIPS 2020] MCUNet: Tiny Deep Learning on IoT Devices; [NeurIPS 2021] MCUNetV2: Memory-Efficient Patch-based Inference for Tiny Deep Learning; MCUNetV3: On-Device Training Under 256KB Memory
This is the official implementation of TinyEngine, a memory-efficient and high-performance neural network library for microcontrollers. TinyEngine is part of MCUNet, which also consists of TinyNAS. MCUNet is a system-algorithm co-design framework for tiny deep learning on microcontrollers: TinyEngine and TinyNAS are co-designed to fit tight memory budgets. We will soon release the Tiny Training Engine used in MCUNetV3: On-Device Training Under 256KB Memory. If you are interested in getting updates, please sign up here to get notified!
AI Ethics Statements -- Analysis and lessons learnt from NeurIPS Broader Impact Statements
Ashurst, Carolyn, Hine, Emmie, Sedille, Paul, Carlier, Alexis
Ethics statements have been proposed as a mechanism to increase transparency and promote reflection on the societal impacts of published research. In 2020, the machine learning (ML) conference NeurIPS broke new ground by requiring that all papers include a broader impact statement. This requirement was removed in 2021, in favour of a checklist approach. The 2020 statements therefore provide a unique opportunity to learn from the broader impact experiment: to investigate the benefits and challenges of this and similar governance mechanisms, as well as providing insight into how ML researchers think about the societal impacts of their own work. Such learning is needed as NeurIPS and other venues continue to question and adapt their policies. To enable this, we have created a dataset containing the impact statements from all NeurIPS 2020 papers, along with additional information such as affiliation type, location and subject area, and a simple visualisation tool for exploration. We also provide an initial quantitative analysis of the dataset, covering representation, engagement, common themes, and willingness to discuss potential harms alongside benefits. We investigate how these vary by geography, affiliation type and subject area. Drawing on these findings, we discuss the potential benefits and negative outcomes of ethics statement requirements, and their possible causes and associated challenges. These lead us to several lessons to be learnt from the 2020 requirement: (i) the importance of creating the right incentives, (ii) the need for clear expectations and guidance, and (iii) the importance of transparency and constructive deliberation. We encourage other researchers to use our dataset to provide additional analysis, to further our understanding of how researchers responded to this requirement, and to investigate the benefits and challenges of this and related mechanisms.
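As a starting point for the kind of additional analysis the authors invite, here is a hedged pandas sketch. The file name and column names (statement, affiliation_type) are hypothetical placeholders; consult the released dataset for its actual schema.

```python
import pandas as pd

# Hypothetical file and column names for illustration only.
df = pd.read_csv("neurips2020_impact_statements.csv")

# Share of statements per affiliation type that mention potential harms,
# using a crude keyword match as a stand-in for proper theme coding.
harms_rate = (
    df.assign(mentions_harms=df["statement"]
                .str.contains(r"harm|risk|misuse", case=False, na=False))
      .groupby("affiliation_type")["mentions_harms"]
      .mean()
)
print(harms_rate.sort_values(ascending=False))
```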
- South America (0.04)
- Asia > China > Beijing > Beijing (0.04)
- Africa (0.04)
Study finds that few major AI research papers consider negative impacts
In recent decades, AI has become a pervasive technology, affecting companies across industries and throughout the world. These innovations arise from research, and the research objectives in the AI field are influenced by many factors. Together, these factors shape patterns in what the research accomplishes, as well as who benefits from it -- and who doesn't. In an effort to document the factors influencing AI research, researchers at Stanford, the University of California, Berkeley, the University of Washington, and University College Dublin & Lero surveyed 100 highly cited studies submitted to two prominent AI conferences, NeurIPS and ICML. They claim that in the papers they analyzed, which were published in 2008, 2009, 2018, and 2019, the dominant values were operationalized in ways that centralize power, disproportionately benefiting corporations while neglecting society's least advantaged.
Correcting Momentum in Temporal Difference Learning
Bengio, Emmanuel, Pineau, Joelle, Precup, Doina
A common optimization tool used in deep reinforcement learning is momentum, which consists of accumulating and discounting past gradients and reapplying them at each iteration. We argue that, unlike in supervised learning, momentum in Temporal Difference (TD) learning accumulates gradients that become doubly stale: not only does the gradient of the loss change due to parameter updates, but the loss itself also changes due to bootstrapping. We first show that this phenomenon exists, and then propose a first-order correction term to momentum. We show that this correction term improves sample efficiency in policy evaluation by correcting target value drift. An important insight of this work is that deep RL methods are not always best served by directly importing techniques from the supervised setting.
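A toy illustration of the setup the abstract describes: tabular TD(0) with momentum on a two-state chain, with comments marking the two sources of staleness. The paper's first-order correction term is not reproduced here; this shows only the uncorrected baseline.

```python
import numpy as np

# Toy 2-state Markov chain; rewards are received on entering each state.
P = np.array([[0.9, 0.1], [0.2, 0.8]])
r = np.array([0.0, 1.0])
gamma, lr, beta = 0.9, 0.01, 0.9

w = np.zeros(2)   # tabular value estimates V(s) = w[s]
m = np.zeros(2)   # momentum buffer
rng = np.random.default_rng(0)
s = 0
for t in range(20000):
    s2 = rng.choice(2, p=P[s])
    # Source of staleness #1: the target bootstraps on w[s2], which drifts.
    td_error = r[s2] + gamma * w[s2] - w[s]
    # Source of staleness #2: the semi-gradient itself depends on current w.
    g = np.zeros(2)
    g[s] = -td_error                  # semi-gradient of 0.5 * td_error**2 w.r.t. w[s]
    m = beta * m + g                  # momentum accumulates stale gradients
    w = w - lr * m
    s = s2
print(w)
```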